
BaconDecomposition R parity goldens#457

Merged
igerber merged 13 commits into main from feature/bacon-r-parity-goldens
May 16, 2026

@igerber
Owner

@igerber igerber commented May 16, 2026

Summary

  • Closes the deferred R parity follow-up from PR #454, the BaconDecomposition methodology audit (Goodman-Bacon 2021); the TODO.md row is removed.
  • Generated benchmarks/data/r_bacondecomp_golden.json from the committed benchmarks/R/generate_bacon_golden.R script against bacondecomp 0.1.1 on R 4.5.2 (3 DGP fixtures).
  • tests/test_methodology_bacon.py::TestBaconParityR now active (3/3 pass, no skips): TWFE coefficient parity + weights-sum parity at atol=1e-6 across all 3 fixtures; per-component estimate + weight parity at atol=1e-6 on the 2 non-remap fixtures.
  • Documented one structural convention divergence on always_treated_remapped: R keeps first_treat=1 as a distinct timing cohort (Later vs Always Treated rows); Python's paper-footnote-11 convention remaps those units to U and folds them into a single treated_vs_never cell per treated cohort. Aggregate is invariant per Theorem 1; per-component breakdown differs. Per-component test skipped on this fixture with explicit documentation; aggregate parity locked.
  • Tracker promotion: METHODOLOGY_REVIEW.md status row → **Complete** (was **Complete** (R parity pending)). Removed from In Progress prose mention + Priority Order list.
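
The aggregate checks listed above (TWFE coefficient parity and weights-sum parity at atol=1e-6) can be sketched as follows. This is a minimal illustration, not the code in `TestBaconParityR`: the `(weight, estimate)` layout, the function name `check_aggregate_parity`, and the numbers are all made up, and it assumes the TWFE coefficient is compared as the weighted sum of the 2x2 components.

```python
import numpy as np

ATOL = 1e-6  # tolerance quoted in the PR description


def check_aggregate_parity(py_components, r_twfe, r_weight_sum, atol=ATOL):
    """Return (twfe_ok, weights_ok) for one fixture.

    py_components is a list of (weight, estimate) pairs; this layout is a
    hypothetical stand-in for whatever the Python decomposition returns.
    """
    weights = np.array([w for w, _ in py_components], dtype=float)
    estimates = np.array([e for _, e in py_components], dtype=float)
    py_twfe = float(weights @ estimates)  # weighted sum of 2x2 estimates
    py_weight_sum = float(weights.sum())
    return (
        bool(np.isclose(py_twfe, r_twfe, rtol=0.0, atol=atol)),
        bool(np.isclose(py_weight_sum, r_weight_sum, rtol=0.0, atol=atol)),
    )


# Made-up components and made-up "R" reference values.
components = [(0.5, 2.0), (0.3, 1.0), (0.2, -0.5)]
twfe_ok, weights_ok = check_aggregate_parity(
    components, r_twfe=1.2, r_weight_sum=1.0
)
```

Setting `rtol=0.0` keeps the comparison a pure absolute-tolerance check, which is how the `atol=1e-6` figure in the summary reads.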

Methodology references (required if estimator / math changes)

  • Method name(s): BaconDecomposition (cross-language R parity validation; no algorithm changes)
  • Paper / source link(s): Goodman-Bacon (2021), J. Econometrics 225(2), 254-277. R reference: bacondecomp::bacon() (CRAN).
  • Any intentional deviations from the source (and why): One R-vs-Python convention divergence documented in new REGISTRY **Note (R parity convention divergence on always-treated)**. The aggregate TWFE coefficient + weights-sum match R at atol=1e-6; only the per-component U-bucket decomposition differs (R splits always-treated as separate type; Python remaps to U per paper footnote 11). Theorem 1's identity is invariant to this re-bucketing.
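
The re-bucketing claim above rests on simple arithmetic, sketched here with hypothetical notation (the symbols are not taken from the paper or the REGISTRY). Suppose that for a treated cohort $k$, R reports one row against the always-treated units and one against the never-treated units, with weights $s_{kA}, s_{kU}$ and estimates $\hat\beta_{kA}, \hat\beta_{kU}$. Merging them into one cell with the combined weight and the weight-averaged estimate leaves that cohort's contribution to the weighted sum unchanged:

```latex
\begin{aligned}
s_k &= s_{kA} + s_{kU}, \qquad
\hat\beta_k = \frac{s_{kA}\,\hat\beta_{kA} + s_{kU}\,\hat\beta_{kU}}{s_{kA} + s_{kU}}, \\
s_k\,\hat\beta_k &= s_{kA}\,\hat\beta_{kA} + s_{kU}\,\hat\beta_{kU}.
\end{aligned}
```

This shows only that merging cells preserves the aggregate. Whether Python's single `treated_vs_never` cell actually equals the merged R rows is an empirical question for the parity tests.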

Validation

  • Tests added/updated: tests/test_methodology_bacon.py (per-component test now skips the always_treated_remapped fixture with explicit reason; aggregate tests unchanged). 33/33 in methodology bacon file (was 30+3 skipped); 32 in test_bacon.py; 101 across broader bacon/decompose surface (was 98+3 skipped).
  • Backtest / simulation / notebook evidence (if applicable): N/A (no behavior change; goldens + tracker promotion only).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions

Overall Assessment
✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator logic, weighting, or variance code; it adds R goldens, activates parity tests, and updates tracker/docs.
  • The new parity evidence is internally consistent: the committed goldens include the Later vs Always Treated rows on always_treated_remapped, and the PR documents that structural R/Python difference instead of treating it as a defect.
  • Promoting Bacon from “R parity pending” to **Complete** and removing the Bacon TODO row is supported by the added JSON artifact and active parity coverage.
  • Only minor P3 items remain: one public docstring now overstates universal R parity, and one test skip message still references the removed TODO deferral.
  • Verification note: I could not execute tests in this environment because pytest and pandas are not installed; assessment is based on diff/static inspection.

Methodology

  • P3 Informational — docs/methodology/REGISTRY.md:2661 and tests/test_methodology_bacon.py:397. Impact: the only cross-language mismatch in scope is the always-treated component breakdown, and the PR handles it correctly under the review policy by documenting it in the Registry and skipping only the non-comparable per-component assertion. Concrete fix: none required.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. Removing the Bacon parity row from TODO.md is justified by the committed goldens and activated parity checks.

Security

  • No findings.

Documentation/Tests

  • P3. diff_diff/bacon.py:1304. Impact: the new example comment says the default path “matches R bacondecomp::bacon() at atol=1e-6” without mentioning the documented always-treated convention exception, so the public docstring is broader than the actual parity contract. Concrete fix: qualify the sentence with the same exception already captured in REGISTRY.md.
  • P3. tests/test_methodology_bacon.py:301. Impact: if the golden JSON is absent in a partial checkout or packaging scenario, the skip message still says parity is “deferred” and points to TODO.md, but that deferral row has been removed by this PR. Concrete fix: update the skip text to point only to benchmarks/R/generate_bacon_golden.R / benchmarks/data/r_bacondecomp_golden.json and remove the TODO reference.

igerber added a commit that referenced this pull request May 16, 2026
R1 verdict was Looks good with 2 P3 informational items. Both addressed:

1. P3 (Documentation/Tests): `bacon_decompose()` docstring example
   said "matches R bacondecomp::bacon() at atol=1e-6" without mentioning
   the documented always-treated convention exception. Qualified the
   example to spell out the aggregate-vs-per-component split:
   aggregate parity holds for all panels at atol=1e-6, per-component
   parity holds when first_treat is bounded below by min(time) (no
   always-treated), and the divergence on always-treated panels is by
   convention (Python remap-to-U vs R's `Later vs Always Treated`).
   Cross-references the REGISTRY note for the full contract.

2. P3 (Documentation/Tests): `TestBaconParityR`'s skip message still
   said the goldens were "deferred until R is provisioned (see TODO.md)"
   but the TODO row was removed in this PR. Updated to describe the
   intended skip case (partial-checkout / packaging scenarios where
   the committed JSON is unavailable) and dropped the TODO reference.
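
The skip behaviour described in item 2 could look roughly like the sketch below. It is a stdlib-only illustration: `load_golden` is a hypothetical helper, `unittest.SkipTest` stands in for whatever skip mechanism the real suite uses, and the message wording is not the committed text.

```python
import json
import unittest
from pathlib import Path

# Path taken from the PR description; the helper itself is hypothetical.
GOLDEN_PATH = Path("benchmarks/data/r_bacondecomp_golden.json")


def load_golden(path=GOLDEN_PATH):
    """Load the committed R goldens, or skip if the file is not present."""
    path = Path(path)
    if not path.exists():
        # Expected only in partial-checkout or packaging scenarios, since
        # the JSON is committed; no TODO.md pointer is needed any more.
        raise unittest.SkipTest(
            f"R golden file {path} is not available (partial checkout or "
            "packaging scenario); regenerate it with "
            "benchmarks/R/generate_bacon_golden.R"
        )
    with path.open() as fh:
        return json.load(fh)
```

Raising the skip from a shared loader keeps the message in one place, so every parity test reports the same reason when the file is missing.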

Tests unchanged: 33/33 pass in test_methodology_bacon.py.
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 00ee515275bd3fe2026ba2f3612f8e58da336bba


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator logic, weighting, variance, identification checks, or defaults; it adds committed R goldens, activates parity coverage, and updates Bacon docs/trackers.
  • Re-review status: the two prior P3s are resolved. The public docstring now scopes R parity correctly at diff_diff/bacon.py:1304-L1313, and the missing-golden skip text no longer points to the removed TODO deferral at tests/test_methodology_bacon.py:301-L309.
  • Methodology cross-check is consistent with the paper/registry: always-treated units belong in U under paper footnote 11, while R keeps a separate Later vs Always Treated breakdown; the PR documents that convention difference and limits per-component parity to the comparable fixtures at docs/methodology/REGISTRY.md:2661-L2676, docs/methodology/papers/goodman-bacon-2021-review.md:171-L197, and tests/test_methodology_bacon.py:356-L415.
  • One minor documentation nit remains: the Bacon tracker still says R parity “activates once goldens are committed” even though the goldens are committed and the tests are active at METHODOLOGY_REVIEW.md:932.
  • Verification note: I could not execute tests/test_methodology_bacon.py because pytest is not installed in this environment.

Methodology

  • Severity: P3 informational. Impact: the only R/Python mismatch in scope is the always-treated convention difference, and it is explicitly documented in the Registry as a deviation from R while keeping aggregate parity locked across all three fixtures; under the review policy this is not a defect. Concrete fix: none required. Refs: docs/methodology/REGISTRY.md:2673-L2676, tests/test_methodology_bacon.py:321-L415.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. Removing the Bacon R-parity TODO row is justified by the committed golden artifact and active parity coverage.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: METHODOLOGY_REVIEW.md:932 still reads as if R parity will activate only after future golden-commit work, which is stale after this PR and mildly confusing for tracker readers. Concrete fix: change that line to say the 33 methodology tests are all active, including the committed R-parity tests.
  • No defect finding on the prior re-review items: diff_diff/bacon.py:1304-L1313 and tests/test_methodology_bacon.py:301-L309 now reflect the intended contract.

igerber added a commit that referenced this pull request May 16, 2026
R2 verdict was Looks good with 1 P3 informational item. METHODOLOGY_REVIEW.md
Test Coverage line read "all active; R parity activates once goldens are
committed" - stale after this PR commits the goldens and activates the
3 R-parity tests. Updated to reflect the post-PR state: all 33 tests
active including R-parity (with pointer to the committed JSON).
igerber and others added 3 commits May 16, 2026 14:44
Closes the PR #454 deferred R parity follow-up (TODO.md row removed).

Generated `benchmarks/data/r_bacondecomp_golden.json` from the committed
`benchmarks/R/generate_bacon_golden.R` script against `bacondecomp 0.1.1`
on R 4.5.2. Three DGP fixtures: `uniform_3groups_with_never_treated`,
`two_groups_no_never_treated`, `always_treated_remapped`.

Parity results at atol=1e-6 via `tests/test_methodology_bacon.py::TestBaconParityR`:
- TWFE coefficient: ✅ matches across all 3 fixtures
- Weights-sum:      ✅ matches across all 3 fixtures
- Per-component:    ✅ on the 2 non-remap fixtures; **structural convention
  divergence** on `always_treated_remapped` (skipped per-component, kept
  aggregate). R keeps `first_treat=1` as a distinct timing cohort and
  emits `Later vs Always Treated` comparisons; Python's paper-footnote-11
  convention remaps those units to `U` and folds them into a single
  `treated_vs_never` cell per treated cohort. The aggregate is invariant
  per Theorem 1 — the U bucket's weight is re-allocated across nested
  2x2 cells but the total weight on {cohort_k vs U} is identical. Only
  the per-component breakdown differs structurally between conventions.

Tracker promotions:
- METHODOLOGY_REVIEW.md: BaconDecomposition status row → **Complete**
  (was `**Complete** (R parity pending)`); removed from In Progress
  prose mention; removed from Priority Order substantive-review list;
  Test Coverage count refreshed (24 → 33); R Comparison Results block
  rewritten as **Validated**.
- docs/methodology/REGISTRY.md: Reference Implementations bullet + Verified
  Components checklist + Note (weight modes) updated; new Note (R parity
  convention divergence on always-treated) documents the convention.
- TODO.md: BaconDecomposition R parity goldens row removed.
- CHANGELOG.md: new `[Unreleased]` Added bullet for the close-out;
  PR-B Changed entry tightened ("intended to match" → "matching ... at
  atol=1e-6").
- diff_diff/bacon.py: `bacon_decompose` docstring example wording
  tightened from "intended to match" to "matches" with TestBaconParityR
  pointer.

Tests: 33/33 pass in test_methodology_bacon.py (no skips; was 30+3
skipped); 32 pass in test_bacon.py; 101 pass across the broader
bacon/decompose surface (was 98+3 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the feature/bacon-r-parity-goldens branch from a9e3c64 to 86facdd Compare May 16, 2026 18:45
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 86facdd92c138022c81e1d399340ff70ef7a9d77


Overall assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator logic, weights, variance, assumptions, or defaults; it commits R goldens, activates Bacon parity coverage, and updates methodology/tracker docs.
  • The prior re-review items are resolved: the public bacon_decompose() docstring now scopes R parity correctly at diff_diff/bacon.py:L1304-L1313, and the missing-golden skip text now reflects committed artifacts at tests/test_methodology_bacon.py:L301-L309.
  • The only Python/R mismatch in scope is the always-treated component breakdown on always_treated_remapped; that divergence is explicitly documented in docs/methodology/REGISTRY.md:L2686-L2686 and METHODOLOGY_REVIEW.md:L929-L939, so under the stated policy it is informational rather than a defect.
  • Minor remaining gap: the parity suite skips direct component-level validation for always_treated_remapped, so the documented R-to-Python fold-back claim is not directly asserted in tests at tests/test_methodology_bacon.py:L401-L415.
  • Verification note: I could not execute the Bacon methodology tests locally because this environment does not have pytest or pandas installed.

Methodology

  • P3 informational. Impact: the PR’s only cross-language deviation is the always-treated convention difference, and it is now explicitly documented as a Registry note with aggregate parity still locked across all three fixtures, which makes it mitigated under the review policy. Concrete fix: none required. Refs: docs/methodology/REGISTRY.md:L2670-L2686, METHODOLOGY_REVIEW.md:L929-L939, tests/test_methodology_bacon.py:L401-L415.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3. Impact: TestBaconParityR.test_component_estimates_match_r() fully skips the always_treated_remapped fixture, so the suite does not directly pin the documented structural claim that R’s split Later vs Always Treated + Treated vs Untreated rows collapse to Python’s single treated_vs_never row per cohort. A cohort-level regression in that fold could slip through if overall TWFE parity still holds. Concrete fix: add a fixture-specific assertion for always_treated_remapped that aggregates R’s always-treated and untreated rows by treated cohort, then compares the resulting combined weight and weighted-average estimate to Python’s treated_vs_never component for that cohort. Refs: tests/test_methodology_bacon.py:L356-L415, docs/methodology/REGISTRY.md:L2686-L2686.

R3 verdict was Looks good with 1 P3 informational item. The per-component
parity test skips the `always_treated_remapped` fixture (R/Python decompose
the U bucket differently by convention), and the REGISTRY note documents
that aggregating R's `Later vs Always Treated` + `Treated vs Untreated`
rows by treated cohort should match Python's single `treated_vs_never`
component for that cohort. The reviewer flagged that the documented
structural claim was not directly asserted in tests — a cohort-level
regression in the fold-back could slip through under overall TWFE parity.

Per memory `feedback_test_coverage_gap_treat_as_actionable`, the
"test exists but doesn't directly exercise the documented surface"
P3 is actionable. Added `test_always_treated_remapped_fold_back_matches_r`
to `TestBaconParityR`: for each treated cohort in the remap fixture,
aggregate R's `Later vs Always Treated` + `Treated vs Untreated` rows
by combined weight and weight-averaged estimate, then assert both
match Python's `treated_vs_never` component for that cohort at
atol=1e-6. Currently passes — confirms the documented structural
fold-back is exact at numerical precision.
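
The fold-back described above can be sketched as follows. This is not the body of `test_always_treated_remapped_fold_back_matches_r`: the row layout (`treated`, `type`, `weight`, `estimate` keys), the helper name `fold_back`, and every number are assumptions made for illustration.

```python
import numpy as np

# R type strings that belong to the U bucket under the Python convention.
U_TYPES = {"Later vs Always Treated", "Treated vs Untreated"}


def fold_back(r_rows, cohort):
    """Merge R's U-bucket rows for one treated cohort.

    Returns (combined weight, weight-averaged estimate). The dict keys used
    here are a hypothetical layout, not the committed golden schema.
    """
    rows = [r for r in r_rows if r["treated"] == cohort and r["type"] in U_TYPES]
    weight = sum(r["weight"] for r in rows)
    estimate = sum(r["weight"] * r["estimate"] for r in rows) / weight
    return weight, estimate


# Made-up rows: two U-bucket rows plus one timing row that must be ignored.
r_rows = [
    {"treated": 3, "type": "Later vs Always Treated", "weight": 0.10, "estimate": 1.0},
    {"treated": 3, "type": "Treated vs Untreated", "weight": 0.30, "estimate": 2.0},
    {"treated": 3, "type": "Earlier vs Later Treated", "weight": 0.20, "estimate": 0.5},
]
py_treated_vs_never = {3: (0.40, 1.75)}  # made-up Python-side component

weight, estimate = fold_back(r_rows, cohort=3)
py_weight, py_estimate = py_treated_vs_never[3]
fold_ok = bool(
    np.isclose(weight, py_weight, rtol=0.0, atol=1e-6)
    and np.isclose(estimate, py_estimate, rtol=0.0, atol=1e-6)
)
```

Comparing both the combined weight and the weighted-average estimate per cohort is what makes this stricter than aggregate parity: an error that cancels across cohorts would still fail here.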

Tests: 34/34 pass in test_methodology_bacon.py (was 33; +1 new
regression).
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 780d50287c1f7287786c6719eabb8edeb4da3740


Overall assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator logic, weights, variance, assumptions, or defaults; it commits R parity goldens, updates the methodology/tracker text, and expands Bacon parity coverage in tests/test_methodology_bacon.py:L321-L534, docs/methodology/REGISTRY.md:L2670-L2686, and METHODOLOGY_REVIEW.md:L929-L939.
  • The prior re-review gap is substantially addressed: TestBaconParityR.test_always_treated_remapped_fold_back_matches_r() now directly asserts the documented R→Python fold-back for the always-treated U bucket at the treated-cohort level in tests/test_methodology_bacon.py:L468-L534.
  • The Python/R always-treated divergence is now explicitly documented in the Methodology Registry, so under the stated policy it is informational rather than a defect: docs/methodology/REGISTRY.md:L2683-L2686.
  • Residual gap only at P3: the blanket skip for always_treated_remapped still leaves that fixture’s unaffected timing-vs-timing rows without direct per-component parity assertions.
  • Verification note: I could not execute the Bacon methodology tests locally because this environment lacks pytest, numpy, and pandas.

Methodology

  • No findings. The only cross-language mismatch in scope is the always-treated convention difference, and it is explicitly documented as a Registry note rather than an undocumented deviation: docs/methodology/REGISTRY.md:L2683-L2686, METHODOLOGY_REVIEW.md:L935-L939.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 Impact: test_component_estimates_match_r() still skips the entire always_treated_remapped fixture, while the new replacement test only checks the aggregated U-bucket collapse. The committed golden for that fixture still contains six unaffected timing-vs-timing rows, so a regression in those components would be caught only indirectly through aggregate parity checks. Concrete fix: on always_treated_remapped, keep direct parity assertions for the six Earlier/Later vs Treated keys and reserve the special fold-back logic only for the U-bucket rows. Refs: tests/test_methodology_bacon.py:L401-L415, tests/test_methodology_bacon.py:L468-L534, benchmarks/data/r_bacondecomp_golden.json:L124-L205.

R4 verdict was Looks good with 1 P3 informational item: the per-component
parity test skipped the ENTIRE always_treated_remapped fixture, leaving
the 6 timing-vs-timing rows (Earlier/Later vs Earlier/Later Treated
between cohorts 3/4/5) without direct per-component parity assertions.
Per memory feedback_test_coverage_gap_treat_as_actionable, this is the
"test exists but doesn't directly exercise the surface" pattern and
should be actionable.

Narrowed the carve-out: instead of skipping the whole fixture, drop only
the treated_vs_never keys from both Python and R sides (the actual
U-bucket convention divergence), and keep direct atol=1e-6 parity
assertions on the 6 timing-vs-timing keys. Also refined _classify_r_type
to canonicalize R's "Later vs Always Treated" type string to
treated_vs_never (Python folds those rows into the U bucket per paper
footnote 11, so they belong to the U comparison set semantically even
though R numbers them by the always-treated cohort), keeping the
narrow carve-out simple.
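
A rough sketch of the narrowed carve-out follows. It is not the committed `_classify_r_type` or test body: the key layout `(type, treated, control)`, the helper names, and the Python-side labels other than `treated_vs_never` are hypothetical, and the R type strings are assumed to be the ones `bacondecomp` emits.

```python
import numpy as np

# Both U-bucket R types are canonicalized to the same Python label.
R_TO_PY_TYPE = {
    "Treated vs Untreated": "treated_vs_never",
    "Later vs Always Treated": "treated_vs_never",
    "Earlier vs Later Treated": "earlier_vs_later",  # hypothetical label
    "Later vs Earlier Treated": "later_vs_earlier",  # hypothetical label
}


def classify_r_type(r_type):
    """Map an R type string to a Python-side component label."""
    return R_TO_PY_TYPE[r_type]


def timing_only(components):
    """Drop U-bucket keys; keys are (type, treated, control) tuples."""
    return {k: v for k, v in components.items() if k[0] != "treated_vs_never"}


def assert_component_parity(py, r, atol=1e-6):
    """Directly compare (weight, estimate) on the timing-vs-timing keys only."""
    py_t, r_t = timing_only(py), timing_only(r)
    assert py_t.keys() == r_t.keys()
    for key in py_t:
        assert np.allclose(py_t[key], r_t[key], rtol=0.0, atol=atol), key


# Made-up data: R has two U-bucket rows for cohort 3, Python has one.
r = {
    (classify_r_type("Later vs Always Treated"), 3, 1): (0.10, 1.0),
    (classify_r_type("Treated vs Untreated"), 3, "U"): (0.30, 2.0),
    (classify_r_type("Earlier vs Later Treated"), 3, 4): (0.20, 0.5),
}
py = {
    ("treated_vs_never", 3, "U"): (0.40, 1.75),
    ("earlier_vs_later", 3, 4): (0.20, 0.5),
}
assert_component_parity(py, r)
```

Filtering on the canonical label from both sides means the carve-out needs no fixture-specific key list; the U-bucket rows are then covered by the separate fold-back assertion.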

Tests: 34/34 pass in test_methodology_bacon.py (+6 directly asserted
timing-vs-timing comparisons in the remap fixture vs prior coverage).
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8225ba06567f823ee0624cfcec1cd93b08cf79cd


Overall Assessment

Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math, weights, variance, identification checks, or defaults; it adds committed R goldens, updates the parity harness, and promotes the tracker/docs.
  • The prior re-review gap is addressed: always_treated_remapped no longer blanket-skips per-component coverage. The test now keeps the six timing-vs-timing rows under direct parity and adds a separate fold-back assertion for the documented U-bucket convention difference in tests/test_methodology_bacon.py.
  • The Python/R always-treated difference remains explicitly documented in the Methodology Registry, so under the stated policy it is informational rather than a defect in methodology adherence: docs/methodology/REGISTRY.md.
  • One new P3 issue: several docs still describe the old test shape and stale counts after the new fold-back test landed.
  • I could not execute the suite here because this environment lacks pytest and pandas.

Methodology

No findings. The only methodology deviation in scope is the always-treated R/Python convention split, and it is documented in docs/methodology/REGISTRY.md.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • P3 Impact: the prose is now out of sync with the actual parity harness. CHANGELOG.md, REGISTRY.md, and METHODOLOGY_REVIEW.md still say TestBaconParityR has 3 tests and that always_treated_remapped is skipped for per-component parity, but the code now adds a fourth fold-back test and only carves out treated_vs_never rows while still directly checking the six timing-vs-timing rows. The in-test docstring is stale in the same way. Concrete fix: update the release/tracker text and the test docstring to say that TestBaconParityR now has 4 tests, the methodology file count increased by one, and only the U-bucket rows are excluded from direct per-component matching on always_treated_remapped. Refs: CHANGELOG.md, docs/methodology/REGISTRY.md, METHODOLOGY_REVIEW.md, tests/test_methodology_bacon.py.

R5 verdict was Looks good with 1 P3 informational item: docs prose
out of sync with the actual parity harness after R4's carve-out
narrowing. CHANGELOG, REGISTRY, METHODOLOGY_REVIEW, and the fold-back
test's own docstring still said TestBaconParityR has 3 tests and
that always_treated_remapped is skipped for per-component parity,
while the code now adds a fourth fold-back test and only carves out
treated_vs_never rows while keeping direct parity on the 6 timing-
vs-timing rows of that fixture.

Refreshed 6 surfaces:
- METHODOLOGY_REVIEW.md Verified Components checklist + Test Coverage
  count (33 -> 34) + R Comparison Results subsection for the remap
  fixture.
- docs/methodology/REGISTRY.md Reference Implementations bullet,
  Requirements checklist, and Note (R parity convention divergence)
  text to reflect the narrowed carve-out and the fold-back test.
- CHANGELOG.md PR-457 Added entry (4 tests, narrowed carve-out
  description, fold-back test mention).
- tests/test_methodology_bacon.py::test_always_treated_remapped_fold_back_matches_r
  docstring (no longer says the per-component test "skips" the
  fixture; says it carves out only the U-bucket rows).

Tests: 34/34 pass.
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8f504bdf0586d8b721de0da7a9a225aca3e33d6f


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math, weights, variance/SE, identification checks, or defaults; the diff_diff/bacon.py change is docstring-only at diff_diff/bacon.py:1304.
  • The methodology-facing work is the R-parity validation surface: committed goldens for 3 fixtures, direct per-component parity on the 2 non-remap fixtures plus the 6 timing-vs-timing rows of always_treated_remapped, and a separate fold-back assertion for the documented always-treated convention split in tests/test_methodology_bacon.py:356, tests/test_methodology_bacon.py:481, and docs/methodology/REGISTRY.md:2686.
  • The prior re-review documentation gap is mostly resolved: REGISTRY.md, METHODOLOGY_REVIEW.md, TODO.md, and the parity test prose now reflect committed goldens and a 4-test active parity harness at docs/methodology/REGISTRY.md:2671, METHODOLOGY_REVIEW.md:929, TODO.md:73, and tests/test_methodology_bacon.py:301.
  • One residual P3 remains: the older Bacon audit paragraph in CHANGELOG.md still says the JSON goldens are pending and parity tests skip, which now conflicts with the new committed-goldens entry in the same file.
  • I could not run the parity tests locally because this environment lacks pandas.

Methodology

No findings. The only deviation in scope is the always-treated R/Python decomposition convention, and it is explicitly documented in the registry note, so under the stated review policy it is informational rather than a defect at docs/methodology/REGISTRY.md:2681 and docs/methodology/REGISTRY.md:2686.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon parity deferral from TODO.md is consistent with the newly committed golden fixture file.

Security

No findings.

Documentation/Tests

  • P3 Impact: CHANGELOG.md still contains a stale Bacon audit paragraph saying the JSON goldens are deferred until local R provisioning and that TestBaconParityR skips when missing. That now contradicts the new top-level changelog entry, the committed golden file, and the active 4-test parity harness, so the release notes remain internally inconsistent. Concrete fix: update or trim the older PR-B audit bullet so it no longer describes the pre-goldens state; the current truth is already captured in CHANGELOG.md:11, but contradicted by CHANGELOG.md:13.

…e-out

R6 verdict was Looks good with 1 P3 informational item: the older PR-B
audit bullet at CHANGELOG.md:13 (added in PR #454) still described
the pre-goldens deferral state ("JSON goldens deferred", "TestBaconParityR
skips with a pointer when goldens missing", "status flipped to
**Complete (R parity goldens pending)**"). That contradicts the new
PR-457 bullet at CHANGELOG.md:11 (committed goldens + 4 active parity
tests) within the same [Unreleased] section, so the release notes
read as internally inconsistent.

Updated 3 strings in the PR-B bullet to reflect the within-release
close-out:
- Status flip wording: now says the (R parity pending) caveat was
  closed by the parity-goldens bullet above in this same release.
- TestBaconParityR description: 4 tests, all active post-release;
  skips only in partial-checkout scenarios.
- (4) outcome: parity goldens deferral was closed within this release.
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 1bd8b0efa3017bb93408f3d7ff0ebc62120e35fd


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains.

Executive Summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math, weights, SEs, or defaults; it adds committed R-parity goldens, tightens parity assertions around the always-treated fixture, and promotes the Bacon methodology surface to **Complete**.
  • The prior AI-review P3 about stale Bacon changelog text appears resolved: the TODO deferral row is removed, the tracker status is updated, and the new parity tests/docs are wired together consistently.
  • [Newly identified] The authoritative methodology contract still overstates source adherence on the always-treated boundary case. REGISTRY.md now presents first_treat <= min(time) as “per paper footnote 11”, but the repo’s own methodology prose says the paper’s strict shorthand is t_i < 1 and that folding first_treat == min(time) into U is a library boundary convention.
  • The new R-parity coverage is otherwise well targeted: it now directly checks the 6 timing-vs-timing rows on always_treated_remapped and separately pins the documented R→Python U-bucket fold-back.
  • I could not execute tests/test_methodology_bacon.py locally because this environment lacks pytest, numpy, and pandas.

Methodology

  • [Newly identified] Severity: P1. docs/methodology/REGISTRY.md:L2619-L2619 and docs/methodology/REGISTRY.md:L2684-L2686 describe remapping first_treat <= min(time) to U as “per paper footnote 11”, and diff_diff/bacon.py:L473-L477 repeats that framing. But the repo’s own methodology prose distinguishes the paper’s strict t_i < 1 rule from the library’s broader <= min(time) boundary convention at tests/test_methodology_bacon.py:L589-L600, and METHODOLOGY_REVIEW.md:L944-L944 likewise contrasts t_i < 1 with the implemented <= min(time) rule. With this PR removing the Bacon TODO and promoting the surface to **Complete**, the authoritative REGISTRY still does not label that boundary extension as a deviation/library convention.

    Impact: users reading the Bacon methodology contract are told this boundary behavior is paper-faithful when the repo’s own methodology notes say otherwise. That is an undocumented methodology deviation under the review rubric, specifically on first-period-treated cohorts such as the new always_treated_remapped parity fixture.

    Concrete fix: add an explicit REGISTRY note/deviation stating that the paper puts t_i < 1 in U, while Python intentionally also folds first_treat == min(time) into U as a boundary convention because those units have no untreated observations in-panel; then align the matching wording in diff_diff/bacon.py, CHANGELOG.md, and the Bacon methodology review text.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon R-parity TODO row is consistent with the committed golden file and the new parity-test surface.

Security

No findings.

Documentation/Tests

  • Severity: P3. CHANGELOG.md:L13-L13 still describes tests/test_methodology_bacon.py as “~24 tests across 6 classes”, while the same PR updates METHODOLOGY_REVIEW.md:L931-L933 to 34 methodology tests and the file now contains 34 test methods.

    Impact: minor release-note inconsistency about the size of the active validation surface.

    Concrete fix: update the changelog phrase to 34 tests or explicitly scope ~24 as the audit-time count rather than the current post-goldens state.

Execution note: I was unable to run the Bacon methodology suite in this environment because pytest, numpy, and pandas are unavailable.

Path to Approval

  1. Add an explicit REGISTRY note/deviation for the first_treat == min(time) boundary convention, and mirror that wording in the Bacon docstring/changelog text so the promoted BaconDecomposition methodology surface no longer overclaims paper-faithfulness on that case.

R7 surfaced a real P1: the REGISTRY presented the always-treated remap
condition `first_treat <= min(time)` as "per paper footnote 11", but
the paper's strict rule is `t_i < 1` (units treated *before* the first
observable period). The inclusive `<= min(time)` rule additionally
folds `first_treat == min(time)` cohorts into U — that's a library
boundary convention, not a paper-faithful rule. The test class
docstring already called this out, but the authoritative REGISTRY
contract did not, which read as an undocumented methodology deviation
on PRs that promote BaconDecomposition to **Complete**.

Resolution: added a new explicit `**Deviation (first-period boundary
extension on always-treated remap)**` block to REGISTRY's Bacon section
that:
- Names the paper's strict `t_i < 1` rule
- States the library's inclusive `<= min(time)` rule
- Explains the rationale (`first_treat == min(time)` cohorts have no
  untreated cell in-panel)
- Notes R does NOT apply this fold (it keeps such cohorts in their own
  bucket and emits `Later vs Always Treated`)
- Notes the rules coincide when `min(time) > 1`
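
A minimal sketch of the two rules side by side, for a panel whose first period is 1. The helper names and the `math.inf` never-treated sentinel are assumptions made for illustration; they are not taken from `diff_diff/bacon.py`.

```python
import math

NEVER = math.inf  # assumption: sentinel used in this sketch for never-treated units

def in_u_paper(first_treat):
    """Strict rule: never-treated, or treated before period 1 (t_i < 1)."""
    return first_treat == NEVER or first_treat < 1

def in_u_library(first_treat, min_time):
    """Inclusive rule: never-treated, or first_treat <= min(time)."""
    return first_treat == NEVER or first_treat <= min_time

min_time = 1
cohorts = [1, 3, 4, 5, NEVER]

# Cohorts that the two rules classify differently.
differs = [g for g in cohorts if in_u_paper(g) != in_u_library(g, min_time)]
print(differs)  # [1]: only the first_treat == min(time) cohort
```

On this panel the rules disagree only on the `first_treat == min(time)` cohort, which is the case the Deviation block covers.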

Mirrored in:
- REGISTRY Assumption checks bullet (line 2619): now points at the
  new Deviation block
- REGISTRY `**Note (always-treated remap)**` (line 2684): qualifies
  the "per paper footnote 11" claim
- METHODOLOGY_REVIEW.md Deviations block: re-titled to include
  paper deviations, added the boundary entry as item 1
- `bacon_decompose()` docstring (`bacon.py:467-487`): explicit
  boundary-extension paragraph with REGISTRY pointer
- CHANGELOG PR-457 Added entry: explicit boundary-deviation callout

Also fixes R7 P3: CHANGELOG PR-B test count "~24 tests" updated to
acknowledge the post-release 34-test count after R-parity-goldens
expansion.

Tests: 34/34 pass.
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 2a2c8813ad53c7189fbb99fb311dff6558f7e135


Overall Assessment

Looks good — no unmitigated P0 or P1 findings.

Executive Summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math; it commits R-parity goldens, activates parity tests, and promotes the methodology tracker/docs.
  • Re-review: the prior P1 on the first-period boundary convention is resolved. The authoritative methodology contract now explicitly labels first_treat == min(time) remap as a library deviation from the paper/R behavior in docs/methodology/REGISTRY.md:L2619-L2621 and docs/methodology/REGISTRY.md:L2684-L2687, mirrored in diff_diff/bacon.py:L473-L486 and METHODOLOGY_REVIEW.md:L946-L947.
  • The new R-parity surface is internally consistent: direct per-component parity is now claimed only for the unaffected timing-vs-timing rows of always_treated_remapped, while the documented U-bucket convention difference is tested separately via fold-back aggregation in tests/test_methodology_bacon.py:L441-L549.
  • Removing the Bacon parity deferral row from TODO.md is supported by the committed goldens and the active TestBaconParityR coverage.
  • One minor P3 remains: the METHODOLOGY_REVIEW.md priority list now skips item 13 after Bacon’s removal.
  • I could not execute the test suite locally because this environment lacks pytest, numpy, and pandas.

Methodology

No findings. The prior undocumented-deviation issue is resolved by the new explicit deviation language in docs/methodology/REGISTRY.md:L2619-L2621 and docs/methodology/REGISTRY.md:L2684-L2687, plus the matching docstring update in diff_diff/bacon.py:L473-L486.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Removing the Bacon R-parity deferral from TODO.md is justified by the committed benchmarks/data/r_bacondecomp_golden.json fixture and the now-active parity tests.

Security

No findings. I did not see any secrets or sensitive data concerns in the added golden file.

Documentation/Tests

  • Severity: P3. Impact: METHODOLOGY_REVIEW.md’s “Priority Order” list now jumps from 12 to 14, which makes the tracker look stale after Bacon’s removal and slightly reduces scanability. Concrete fix: renumber Survey Data Support from 14. to 13. in METHODOLOGY_REVIEW.md:L1218-L1225.
  • Execution note: I could not run tests/test_methodology_bacon.py locally because pytest, numpy, and pandas are not installed in this environment.

R8 verdict was Looks good with 1 P3 informational item: the
METHODOLOGY_REVIEW.md Priority Order list jumped from item 12 to
item 14 after PR #457 removed Bacon (the prior item 1). Renumbered
Survey Data Support from 14 to 13 to close the gap.
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 9210aeb9e040fc10eac10b7f914daa760a404b3f


Overall Assessment

✅ Looks good — no unmitigated P0 or P1 findings.

Executive Summary

  • Affected method: BaconDecomposition (Goodman-Bacon 2021). This PR does not change estimator math; it commits R-parity goldens, activates parity tests, and updates methodology tracking/docs.
  • Re-review: the prior tracker-hygiene P3 is resolved. Bacon is now marked **Complete** and the priority list is renumbered correctly in METHODOLOGY_REVIEW.md:L79-L82 and METHODOLOGY_REVIEW.md:L1218-L1225.
  • The previously concerning always-treated boundary/convention gap is now explicitly documented in the authoritative surfaces: docs/methodology/REGISTRY.md:L2617-L2620, docs/methodology/REGISTRY.md:L2683-L2687, and diff_diff/bacon.py:L473-L486.
  • The new parity surface is scoped correctly: tests/test_methodology_bacon.py:L356-L549 keeps direct per-component parity on the unaffected timing-only rows of always_treated_remapped and separately asserts the R-to-Python U-bucket fold-back by cohort.
  • One minor P3 remains: the updated example comment in diff_diff/bacon.py:L1313-L1320 states the per-component parity condition with the wrong inequality relative to the library’s own first_treat <= min(time) remap rule.

Methodology

  • No findings. The PR’s methodology-facing changes are documentation and validation only, and the R-vs-Python always-treated difference is now explicitly documented rather than left implicit.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. Removing the Bacon TODO row is supported by the committed golden fixture in benchmarks/data/r_bacondecomp_golden.json and the active parity coverage in tests/test_methodology_bacon.py:L301-L549.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: diff_diff/bacon.py:L1313-L1320 says direct per-component R parity holds when first_treat is “bounded below by min(time).” Under the library’s documented rule, the carve-out is exactly the first_treat == min(time) cohort, so this wording reverses the boundary condition and can mislead readers about when the always-treated convention divergence applies. Concrete fix: rephrase to “when no cohort has first_treat <= min(time) other than never-treated sentinels” or simply “when there are no always-treated / first-period-treated cohorts.”
  • Execution note: I could not run tests/test_methodology_bacon.py locally because this environment does not have pytest or pandas installed.

R9 verdict was Looks good with 1 P3 informational item: the
bacon_decompose docstring example said per-component parity holds
"when first_treat is bounded below by min(time)", which gets the
boundary wrong: a non-strict lower bound still admits
`first_treat == min(time)`. Under the library's `<= min(time)`
remap rule, always-treated panels are exactly the ones WITH at least
one non-sentinel `first_treat <= min(time)`, so per-component parity
holds when all non-sentinel cohorts have `first_treat > min(time)`
(i.e. strictly later than min(time), or equivalently "no
first-period-treated cohorts"). Rephrased.
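
The condition that every non-sentinel cohort has `first_treat > min(time)` can be sketched as a small predicate. The function name and the `sentinels` default are hypothetical; the library's actual never-treated encoding may differ.

```python
import math

def direct_parity_expected(first_treats, times, sentinels=(math.inf,)):
    """True when no non-sentinel cohort is treated at or before min(time).

    Assumption: `sentinels` lists the values that mark never-treated units.
    """
    min_time = min(times)
    cohorts = {g for g in first_treats if g not in sentinels}
    return all(g > min_time for g in cohorts)

periods = list(range(1, 11))  # hypothetical panel covering periods 1..10

# Timing cohorts only: direct per-component comparison with R applies.
print(direct_parity_expected([3, 4, 5, math.inf], periods))     # True

# Adding a first_treat == min(time) cohort triggers the remap carve-out.
print(direct_parity_expected([1, 3, 4, 5, math.inf], periods))  # False
```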

Tests: 34/34 pass; no behavior change (docstring-only fix).
@igerber
Owner Author

igerber commented May 16, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e592a5b4154791b8356d3756734961bb253a406d


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review: the prior P3 about the bacon_decompose example’s boundary wording is resolved in diff_diff/bacon.py (L1312-L1322).
  • Affected method: BaconDecomposition / Goodman-Bacon (2021). This PR changes validation, registry/tracker status, and documentation; it does not change estimator math or inference code.
  • The always-treated / first-period-treated behavior is now explicitly documented as a deviation rather than left implicit, in docs/methodology/REGISTRY.md (L2686-L2687) and METHODOLOGY_REVIEW.md (L946-L949), so it is non-blocking under the review rubric.
  • The new parity surface is materially stronger: aggregate parity across all 3 fixtures, direct per-component parity on unaffected rows, and a dedicated always-treated fold-back check in tests/test_methodology_bacon.py (L481-L549).
  • Two minor P3s remain in the test/fixture surface: the committed golden metadata overstates full per-component parity, and the new fold-back selector is less version-robust than the adjacent classifier.
  • Execution note: I could not run the Bacon parity tests locally because this environment is missing pandas.

Methodology

  • No findings. BaconDecomposition is the affected method, and the PR’s only methodology-facing deviation, the inclusive first_treat <= min(time) remap, is explicitly labeled as a deviation in docs/methodology/REGISTRY.md (L2686-L2687) and mirrored in diff_diff/bacon.py (L473-L486).

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3. Impact: the new fold-back test selects R rows using case-sensitive literal substrings at tests/test_methodology_bacon.py (L520-L526), even though the neighboring classifier at tests/test_methodology_bacon.py (L373-L399) already handles cross-version bacondecomp label variation. That makes future regeneration of the committed goldens more brittle than the rest of the parity harness. Concrete fix: reuse the normalized classifier or lowercased semantic matching when selecting the R-side untreated/always-treated rows for fold-back aggregation.

Tech Debt

  • No findings. Removing the Bacon TODO row is supported by the committed goldens and active parity coverage.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the committed golden metadata still says the parity target is direct per-component (treated, control, type) tuple parity plus TWFE coefficient at benchmarks/data/r_bacondecomp_golden.json (L6), but this PR intentionally replaces full per-component parity on the always_treated_remapped U-bucket rows with aggregate and fold-back assertions. That can misstate what the fixture is supposed to prove. Concrete fix: update the meta.description to describe the actual contract: aggregate TWFE/weight-sum parity across all fixtures, direct per-component parity on non-remap rows, and cohort-level fold-back parity for the always-treated U bucket.

…meta

Fresh R10 verdict was Looks good with 2 P3 informational items:

1. P3 (Maintainability): the always-treated fold-back test selected
   R rows via case-sensitive literal substrings ("Untreated",
   "Always Treated", "Later"), while the neighboring _classify_r_type
   classifier uses case-insensitive semantic matching. Made the
   selector consistent — case-insensitive matching on "untreated" /
   "never" / "always" tokens, so the fold-back survives bacondecomp
   label variation across versions.

2. P3 (Documentation/Tests): committed golden JSON's meta.description
   still advertised full per-component (treated, control, type) tuple
   parity as the contract, but PR #457 intentionally replaces that for
   the always_treated_remapped U-bucket rows with aggregate +
   fold-back parity. Updated meta.description to describe the actual
   three-tier contract (aggregate / direct per-component on
   non-remap + 6 timing-vs-timing rows / cohort fold-back for U
   bucket) with a pointer to the REGISTRY Notes that document the
   convention divergence.
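
A sketch of the case-insensitive selector from item 1, assuming the token list named above. `is_u_bucket_row` is a hypothetical name; it is not the `_classify_r_type` helper in the test file.

```python
U_TOKENS = ("untreated", "never", "always")

def is_u_bucket_row(r_type: str) -> bool:
    """True when an R `type` label names an untreated / always-treated comparison."""
    label = r_type.lower()
    return any(token in label for token in U_TOKENS)

labels = [
    "Treated vs Untreated",
    "Later vs Always Treated",
    "later vs always treated",   # casing variant
    "Earlier vs Later Treated",
    "Later vs Earlier Treated",
]
for label in labels:
    print(f"{label!r}: {is_u_bucket_row(label)}")
```

Lowercasing before matching keeps the selector stable if a future bacondecomp release changes label capitalization; the timing-vs-timing labels contain none of the three tokens, so they stay out of the fold-back set.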

Tests: 34/34 still pass.
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a86498ee744c56dcd932484f4a9028f657c362d7


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review: the prior P3s are resolved. The committed golden metadata now describes the narrowed always-treated contract in benchmarks/data/r_bacondecomp_golden.json:L2-L6, and the fold-back selector is now semantic/case-insensitive in tests/test_methodology_bacon.py:L521-L530.
  • Affected method: BaconDecomposition / Goodman-Bacon (2021). This PR changes parity fixtures, tests, and methodology/tracker documentation; it does not change estimator math, weights, variance, or inference code.
  • Methodology-wise, the only behavior difference discussed here is the already-implemented first-period boundary extension (first_treat <= min(time)), which is now explicitly documented as a deviation in docs/methodology/REGISTRY.md:L2686-L2687 and mirrored in diff_diff/bacon.py:L473-L486, so it is non-blocking under the review rubric.
  • The R-parity surface is materially stronger: aggregate parity on all three fixtures, direct timing-row parity on the unaffected always_treated_remapped rows, and a dedicated fold-back assertion for the U-bucket divergence in tests/test_methodology_bacon.py:L441-L458 and tests/test_methodology_bacon.py:L481-L554.
  • Execution note: this was a static review only; python -m pytest and import pandas both fail in this environment.

Methodology

No findings. The changed registry/review text is consistent with the paper review’s footnote-11 distinction (t_i < 1) and with the existing inclusive remap implementation documented in diff_diff/bacon.py:L473-L486 and docs/methodology/REGISTRY.md:L2617-L2621,L2683-L2688.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity: P3 [Newly identified]. Impact: the committed golden file and registry now describe the narrowed always_treated_remapped contract, but the advertised regeneration script still documents the old full per-component parity contract and still writes the old generic meta.description. Re-running benchmarks/R/generate_bacon_golden.R would reintroduce contradictory provenance for the same fixture set. References: CHANGELOG.md:L11-L13, benchmarks/data/r_bacondecomp_golden.json:L2-L6, versus benchmarks/R/generate_bacon_golden.R:L8-L12, benchmarks/R/generate_bacon_golden.R:L20-L22, benchmarks/R/generate_bacon_golden.R:L196-L200, and benchmarks/R/generate_bacon_golden.R:L221-L225. Concrete fix: update the script header/comments and the meta.description template to match the current contract: aggregate parity across all three fixtures, direct per-component parity only on the non-U rows of always_treated_remapped, and cohort-level fold-back parity for the U bucket.

…late

R11 verdict was Looks good with 1 P3 informational item: I had updated
the committed JSON's meta.description in R10 to describe the narrowed
contract, but the R generator script at benchmarks/R/generate_bacon_golden.R
still had the old "atol=1e-6 on per-component (treated, control, type)
tuples plus TWFE coefficient" description in BOTH (a) its header
docstring (lines 8-22) AND (b) its meta.description value template
(lines 218-225). Re-running the script would have overwritten my
committed JSON polish with the old contradictory description.

Updated both surfaces to the three-tier contract: (1) aggregate
TWFE + weights-sum on all 3 fixtures; (2) direct per-component parity
on the 2 non-remap fixtures + 6 timing-vs-timing rows of
always_treated_remapped; (3) cohort fold-back parity for the U
bucket on always_treated_remapped. Pointers to REGISTRY Note (R
parity convention divergence on always-treated) + Deviation (first-
period boundary extension).

Re-ran the R script; JSON written matches the committed text and tests
remain green (4/4 in TestBaconParityR, 34/34 across the file). Script
is now idempotent on its own committed output.
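
For readers who want tiers (1) and (3) in concrete form, here is a rough sketch. It assumes the fold-back sums R's U-bucket weights per treated cohort and takes the weight-averaged estimate; the real test in `tests/test_methodology_bacon.py` may aggregate differently. All numbers are made up and are not values from the committed golden file.

```python
import math
from collections import defaultdict

# Hypothetical R-style decomposition rows (weights sum to 1).
r_rows = [
    {"treated": 3, "type": "Treated vs Untreated",     "estimate": 2.0, "weight": 0.20},
    {"treated": 3, "type": "Later vs Always Treated",  "estimate": 1.0, "weight": 0.05},
    {"treated": 4, "type": "Treated vs Untreated",     "estimate": 2.4, "weight": 0.15},
    {"treated": 4, "type": "Later vs Always Treated",  "estimate": 1.2, "weight": 0.05},
    {"treated": 3, "type": "Earlier vs Later Treated", "estimate": 1.8, "weight": 0.30},
    {"treated": 4, "type": "Later vs Earlier Treated", "estimate": 2.2, "weight": 0.25},
]

def is_u_row(row):
    label = row["type"].lower()
    return any(tok in label for tok in ("untreated", "never", "always"))

def fold_back(rows):
    """Collapse U-bucket rows into one (weight, estimate) cell per treated cohort."""
    acc = defaultdict(lambda: [0.0, 0.0])  # cohort -> [sum of w, sum of w * estimate]
    for row in rows:
        if is_u_row(row):
            acc[row["treated"]][0] += row["weight"]
            acc[row["treated"]][1] += row["weight"] * row["estimate"]
    return {g: {"weight": w, "estimate": we / w} for g, (w, we) in acc.items()}

def aggregate(cells):
    return sum(c["weight"] * c["estimate"] for c in cells)

folded = fold_back(r_rows)
timing_rows = [r for r in r_rows if not is_u_row(r)]

before = aggregate(r_rows)
after = aggregate(list(folded.values()) + timing_rows)

# Re-bucketing leaves the weighted sum (the TWFE coefficient) unchanged.
assert math.isclose(before, after, abs_tol=1e-6)
```

In this sketch the aggregate is identical before and after folding, which is why tier (1) can hold on all three fixtures even though the per-component rows differ on `always_treated_remapped`.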
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a202dca3777edf892e0dae7a8202608dba854bf2


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review: the prior parity-provenance P3 is mostly resolved. The committed JSON, activated TestBaconParityR, REGISTRY updates, TODO cleanup, and tracker status now agree on the narrowed always-treated contract.
  • Affected method: BaconDecomposition / Goodman-Bacon (2021). This PR changes parity fixtures, tests, and methodology documentation only; it does not change estimator math, weights, variance, inference, or defaults.
  • The first_treat <= min(time) boundary extension is now explicitly documented as a deviation from paper footnote 11 and from R, so it is non-blocking under the review rubric.
  • One informational issue remains: the fixture-3 inline comment in benchmarks/R/generate_bacon_golden.R still describes the old R behavior and wrong fixture counts.
  • Execution note: static review only; I could not rerun the Bacon tests in this environment because pandas is unavailable.

Methodology

  • No findings. Affected method: BaconDecomposition. The new parity carve-out and fold-back test match the documented contract in docs/methodology/REGISTRY.md:L2670-L2688, are mirrored in diff_diff/bacon.py:L473-L495, and are exercised in tests/test_methodology_bacon.py:L356-L554.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the fixture-3 inline comment in benchmarks/R/generate_bacon_golden.R still says R groups first_treat=1 with U and that the fixture has 30 never-treated units, but the current parity contract says R keeps first_treat=1 as a distinct cohort and the code actually builds 25 never-treated units. That leaves the advertised regeneration script internally contradictory, even though the header and committed JSON metadata were updated. References: benchmarks/R/generate_bacon_golden.R:L207-L219, docs/methodology/REGISTRY.md:L2670-L2688, tests/test_methodology_bacon.py:L441-L554, benchmarks/data/r_bacondecomp_golden.json:L2-L6. Concrete fix: rewrite the fixture-3 inline comment to match the committed contract and actual fixture construction: 5 always-treated, 25 never-treated, and R emitting separate Later vs Always Treated rows instead of folding first_treat=1 into U.

R12 verdict was Looks good with 1 P3 informational item: the
fixture-3 inline comment in benchmarks/R/generate_bacon_golden.R
still described the old contract — said R "natively groups first_treat=1
with U" (wrong; R keeps them as a distinct cohort and emits
`Later vs Always Treated`) and said "30 never-treated" (wrong; the
script builds 25 never-treated). The header docstring + meta.description
template were updated in R11, but this inline block comment was missed.

Rewrote the inline comment to match: (a) the actual fixture
construction (5 always-treated, 25 never-treated, 3 timing cohorts at
times 3/4/5); (b) the correct R behavior (separate cohort, separate
`Later vs Always Treated` rows); (c) pointers to REGISTRY note +
deviation block; (d) what the parity tests carve out vs fold-back.
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 86c0389bfd5800985cd7beeaaa164297562f9878


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review outcome: the prior P3 documentation mismatch is resolved. benchmarks/R/generate_bacon_golden.R:L207-L222 now matches the committed fixture and the narrowed parity contract exercised in tests/test_methodology_bacon.py:L441-L554.
  • Affected method: BaconDecomposition / Goodman-Bacon (2021). This PR changes parity goldens, tests, and methodology/docs; it does not change estimator math, weights, variance/SE, identification checks, or defaults.
  • The only methodology divergence discussed here is the already-documented first-period/always-treated convention difference, explicitly labeled in docs/methodology/REGISTRY.md:L2684-L2688 and mirrored in diff_diff/bacon.py:L473-L486, so it is non-blocking under the review rubric.
  • TODO.md cleanup is justified: the committed golden file, active parity tests, and tracker status are internally consistent across benchmarks/data/r_bacondecomp_golden.json, tests/test_methodology_bacon.py:L301-L554, and METHODOLOGY_REVIEW.md:L915-L939.
  • Static review only: I could not rerun the Bacon parity tests in this environment because pytest is not installed.

Methodology

  • Severity: none. Impact: no undocumented methodology mismatch identified. The R/Python always-treated split and the inclusive first-period remap are both explicitly documented deviations/notes in docs/methodology/REGISTRY.md:L2670-L2688 and reflected in diff_diff/bacon.py:L473-L486. Concrete fix: none.

Code Quality

  • Severity: none. Impact: the parity harness is stricter than before: the carve-out now excludes only the U-bucket rows, while the six timing-vs-timing rows remain under direct per-component parity in tests/test_methodology_bacon.py:L441-L479. Concrete fix: none.

Performance

  • Severity: none. Impact: the PR adds fixtures, tests, and documentation only; no production-path performance regression is introduced. Concrete fix: none.

Maintainability

  • Severity: none. Impact: the generator script, committed goldens, registry, tracker, and bacon.py docstrings now describe the same always-treated contract, reducing future drift. References: benchmarks/R/generate_bacon_golden.R:L8-L22, L207-L222; docs/methodology/REGISTRY.md:L2670-L2688; METHODOLOGY_REVIEW.md:L915-L950; diff_diff/bacon.py:L1312-L1322. Concrete fix: none.

Tech Debt

  • Severity: none. Impact: removing the Bacon parity TODO is supported by committed artifacts and active test coverage, so this does not appear to be premature cleanup. References: TODO.md:L73-L90, tests/test_methodology_bacon.py:L301-L554. Concrete fix: none.

Security

  • Severity: none. Impact: changed files contain synthetic benchmark data and documentation only; no secret or PII exposure identified. Concrete fix: none.

Documentation/Tests

  • Severity: none. Impact: the previous re-review P3 is resolved; fixture-3 comments now match the actual fixture counts and R behavior in benchmarks/R/generate_bacon_golden.R:L207-L222. The new fold-back test at tests/test_methodology_bacon.py:L481-L554 also pins the documented always-treated convention divergence instead of relying on a broad skip. Concrete fix: none.

Residual risk: source inspection only; I could not execute tests/test_methodology_bacon.py here because pytest is unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026
@igerber igerber merged commit 25d5ed4 into main May 16, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/bacon-r-parity-goldens branch May 16, 2026 23:03
igerber added a commit that referenced this pull request May 16, 2026
…eased] CHANGELOG conflict (PR #457 BaconDecomposition R parity goldens)

# Conflicts:
#	CHANGELOG.md
